Classification of scale-free networks

While the emergence of a power-law degree distribution in complex networks is intriguing, the
degree exponent is not universal. Here we show that the betweenness centrality displays a
power-law distribution with an exponent $\eta$ which is robust, and we use it to classify the
scale-free networks. We have observed two universality classes with $\eta \approx 2.2(1)$ and
$2.0$, respectively. Real-world networks for the former are the protein interaction networks,
the metabolic networks for eukaryotes and bacteria, and the co-authorship network, and those
for the latter are the Internet, the world-wide web, and the metabolic networks for archaea.
Distinct features of the mass-distance relation, the generic topology of geodesics, and the
resilience under attack of the two classes are identified. Various model networks also belong
to either of the two classes, while their degree exponents are tunable.



Emergence of a power law in the degree distribution $P_D(k) \sim k^{-\gamma}$ in complex
networks is an interesting self-organized phenomenon in complex systems ([j]; |^). Here, the
degree $k$ means the number of edges incident upon a given vertex. Such a network is called
scale-free (SF) (Q). Real-world networks which are SF include the author collaboration
network (||) in social systems, the protein interaction network (PIN) (|^) and the metabolic
network ((?]) in biological systems, and the Internet (||) and the world-wide web (WWW)
(|; [[(]) in communication systems. The power-law behavior means that most vertices are
sparsely connected, while a few vertices are intensively connected to many others and play an
important role in functionality. While the emergence of such SF behavior in the degree
distribution is itself surprising, the degree exponent $\gamma$ is not universal and depends
on the details of the network structure. As listed in Table 1, numerical values of the
exponent $\gamma$ for various systems are diverse, but most of them are in the range
$2 < \gamma < 3$. From the viewpoint of theoretical physics, it would be interesting to
search for a universal quantity associated with SF networks.

Recently a physical quantity called "load" was introduced as a candidate for the universal
quantity in SF networks. It quantifies the load of a vertex in the transport of data packets
along the shortest pathways in SF networks (11). It was shown that the load distribution
exhibits a power law, $P_L(\ell) \sim \ell^{-\delta}$, and that the exponent $\delta$ is
robust, $\delta \approx 2.2$, for diverse SF networks with various degree exponents in the
range $2 < \gamma < 3$. Since the universal behavior of the load exponent was obtained
empirically, fundamental questions, such as how the robustness of the load exponent is
related to network topology or whether other universality classes exist, have not been
explored yet. In this paper, we address those issues in detail.

While the load is a dynamic quantity, it is closely related to a static quantity, the
"betweenness centrality (BC)", commonly used in sociology to quantify how influential a given
person is in a society (12). To be specific, BC is defined as follows. Let us consider the set
of the shortest pathways, or geodesics, between a pair of vertices $(i, j)$ and denote their
number by $C(i,j)$. Among them, the number of the shortest pathways running through a vertex
$k$ is denoted by $C_k(i,j)$. The fraction $g_k(i,j) = C_k(i,j)/C(i,j)$ may be interpreted as
the amount of the role played by the vertex $k$ in the social relation between two persons $i$
and $j$. Then the BC of the vertex $k$ is defined as the accumulated sum of $g_k(i,j)$ over
all ordered pairs for which a geodesic exists, i.e.,

$$g_k = \sum_{i \ne j} g_k(i,j) = \sum_{i \ne j} \frac{C_k(i,j)}{C(i,j)}. \tag{1}$$

Because the difference between load and BC is only slight, the two quantities behave very
similarly. In fact, the BC $g_k$ of each vertex is exactly the same as the load for tree
graphs. In general, distributions of the two are indistinguishable within available
resolutions. The BC distribution follows a power law,

$$P_B(g) \sim g^{-\eta}, \tag{2}$$

where $g$ means BC and the exponent $\eta$ is the same as the load exponent $\delta$. Since
the topological feature of a network can be grasped more transparently using BC, we deal
with BC in this work.
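
As an illustration of the definition of BC given above, the following Python sketch accumulates $g_k$ for a small unweighted, undirected graph. It is a minimal sketch under stated assumptions rather than the code used for the results reported here: it uses a Brandes-style breadth-first accumulation, the function name `betweenness` and the adjacency-dictionary input are hypothetical, and the end points $i$ and $j$ are assumed not to count as lying on their own geodesics.

```python
from collections import deque

def betweenness(adj):
    """Sum over ordered pairs (i, j) of C_k(i, j) / C(i, j) for every vertex k.

    adj maps each vertex to an iterable of its neighbours (undirected, unweighted).
    Assumption: end points are not counted as lying on their own geodesics.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s; sigma[v] counts the geodesics from s to v.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, passing geodesic fractions
        # to predecessors.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

path = {0: [1], 1: [0, 2], 2: [1]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
bc_path = betweenness(path)
bc_cycle = betweenness(cycle)
```

For the three-vertex path only the middle vertex lies between the ordered pairs (0, 2) and (2, 0), so it receives 2 and the end points receive 0. In the four-vertex cycle each pair of opposite vertices has two geodesics, so every vertex receives 1.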

Based on numerical measurements of the BC exponent for a variety of SF networks, we find that
SF networks can be classified into only two classes, say, class I and class II. For class I,
the BC exponent is $\eta \approx 2.2(1)$, and for class II, it is $\eta \approx 2.0(1)$. We
conjecture the BC exponent for class II to be exactly $\eta = 2$, since it can be derived
analytically for simple models. We show that such different universal behaviors in the BC
distribution originate from different generic topological features of networks. Moreover, we
study a physical problem, the resilience of networks under an attack, which shows different
behaviors for each class as a result of the difference in generic topologies. It is found that
the networks of class II are much more vulnerable to the attack than those of class I.

To obtain our results, we use available data for real-world networks or existing algorithms
for model networks. Once a SF network is constructed, we select a pair of vertices on the
network and identify the shortest pathways between them. Next, BC is measured on each vertex
along the shortest pathways using the modified version of the breadth-first search algorithm
introduced by Newman (13). We measure the BC distribution and the exponent $\eta$ for a
variety of networks both in the real world and in silico.
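
One possible way to extract an exponent such as $\eta$ from a list of measured BC values is a maximum-likelihood estimate for a continuous power-law tail. This is only a sketch of a swapped-in technique: the fits quoted in this work are lines on log-log plots, and the function name `powerlaw_exponent` and the lower cutoff `g_min` are assumptions introduced for illustration.

```python
import math

def powerlaw_exponent(values, g_min):
    """Maximum-likelihood exponent eta for P(g) ~ g**(-eta), using values >= g_min.

    Assumption: the tail above g_min is a continuous pure power law.
    """
    tail = [g for g in values if g >= g_min]
    if not tail:
        raise ValueError("no values at or above g_min")
    log_sum = sum(math.log(g / g_min) for g in tail)
    if log_sum == 0.0:
        raise ValueError("all tail values equal g_min; exponent undefined")
    return 1.0 + len(tail) / log_sum

# Deterministic check: evenly spaced quantiles of an exact power law with eta = 2.2.
n, eta, g_min = 10000, 2.2, 1.0
sample = [g_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (eta - 1.0)) for i in range(n)]
estimate = powerlaw_exponent(sample, g_min)
```

The check feeds the estimator synthetic quantiles rather than network data, so it only confirms that the formula recovers a known exponent. On real BC values the result depends on the choice of `g_min`.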

Real World and Artificial Networks Investigated 

The networks that we find to belong to class I with $\eta = 2.2(1)$ include: (i) The
co-authorship network in the field of neuroscience, published in the period 1991-1998 (14),
where vertices represent scientists and they are connected if they wrote a paper together.
(ii) The protein interaction network of the yeast S. cerevisiae compiled by Jeong et al. (||)
(PIN1), where vertices represent proteins and two proteins are connected if they interact
(footnote 1). (iii) The core of the protein interaction network of the yeast S. cerevisiae
obtained by Ito et al. (PIN2) (15) (footnote 2). (iv) The metabolic networks for 5 species of
eukaryotes and 32 species of bacteria in Ref. (0), where vertices represent substrates and
they are connected if a reaction occurs between two substrates via enzymes. The reaction
normally occurs in one direction, so that the network is directed. (v) The Barabási-Albert
(BA) model (16) when the number of incident edges of an incoming vertex is $m \ge 2$. (vi) The
geometric growth model by Huberman and Adamic (10). (vii) The copying model (17), whose degree
exponent is in the range $2 < \gamma < 3$. (viii) The undirected or the directed static model
(11), whose degree exponent is in the range $2 < \gamma < 3$ or
$2 < (\gamma_{\rm in}, \gamma_{\rm out}) < 3$, respectively. (ix) The accelerated growth model
proposed by Dorogovtsev et al. (18). (x) The fitness model (19) with a flat fitness
distribution. (xi) The stochastic model for the protein interaction networks introduced by
Solé et al. (20). All those networks (i)-(xi) exhibit a power-law behavior in the BC
distribution with the exponent $\eta \approx 2.2(1)$. Detailed properties of each network are
listed in Table 1. The representative BC distributions for real-world networks (i), (iii), and
(iv) are shown in Fig. 1a.

The networks that we find to belong to class II with $\eta = 2.0$ include: (xii) The Internet
at the autonomous systems (AS) level as of October, 2001 (22). (xiii) The metabolic networks
for 6 species of archaea in Ref. (0). (xiv) The WWW of nd.edu (|). (xv) The BA model with
$m = 1$ (16). (xvi) The deterministic model by Jung et al. (23). In particular, the networks
(xv) and (xvi) are of tree structure, where the edge BC distribution can be solved
analytically. The detailed properties of each network are listed in Table 1. The BC
distributions for real-world networks (xii) and (xiv) are shown in Fig. 1b. Since the BC
exponents of the two classes are numerically very close, one may wonder whether there really
exist two different universality classes beyond the error bars. To make this point clear, we
plot the BC distributions for the BA model with $m = 1$, 2, and 3 in Fig. 2, obtained for a
large system size, $N = 3 \times 10^5$. We can clearly see different behaviors between the BC
distributions for the case of $m = 1$ (class II) and the cases of $m = 2$ and 3 (class I).

Footnote 1: The network is composed of disconnected clusters of different sizes, viz., small
isolated clusters as well as a giant cluster. For both (ii) and (xi), the degree distribution
is likely to follow a power law, but an exponential cutoff is needed to describe its tail
behavior for finite systems. However, it converges to a clean power law for (xi) as the system
size increases, but the convergence rate is rather slow (21). Despite this abnormal behavior
in the degree distribution for finite systems, the BC distribution follows a pure power law
with the exponent $\eta \approx 2.2(1)$ in (ii) and (xi).

Footnote 2: In contrast to (ii), the degree distribution obeys a power law.

Topology of the Shortest Pathways 

To understand the generic topological features of the networks in each class, we focus in
particular on the topology of the shortest pathways between two vertices separated by a
distance $d$. Along the shortest pathways, we count the total number of vertices
$\mathcal{M}(d)$ lying on these roads, averaged over all pairs of vertices separated by the
same distance $d$. Adopting the terminology of fractal theory, $\mathcal{M}(d)$ is called the
"mass-distance" relation. We find that it behaves differently for each class: for class I,
$\mathcal{M}(d)$ behaves nonlinearly (Figs. 3a-b), while for class II, it is roughly linear
(Figs. 3c-d).
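
The mass-distance relation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used for Fig. 3: it assumes an undirected, unweighted graph given as an adjacency dictionary, counts both end points as part of the mass, and uses the hypothetical names `bfs_dist` and `mass_distance`. A vertex $v$ lies on some geodesic between $i$ and $j$ exactly when $d(i,v) + d(v,j) = d(i,j)$.

```python
from collections import deque, defaultdict

def bfs_dist(adj, s):
    """Hop distances from s to every vertex reachable from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def mass_distance(adj):
    """Average number of vertices on the geodesics between pairs at distance d.

    Assumptions: undirected graph; the two end points are included in the count.
    Cost grows as N**3, so this is only meant for small graphs.
    """
    dist = {s: bfs_dist(adj, s) for s in adj}
    nodes = list(adj)
    total = defaultdict(int)
    pairs = defaultdict(int)
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            if j not in dist[i]:
                continue  # no geodesic between i and j
            d = dist[i][j]
            mass = sum(
                1 for v in nodes
                if v in dist[i] and j in dist[v] and dist[i][v] + dist[v][j] == d
            )
            total[d] += mass
            pairs[d] += 1
    return {d: total[d] / pairs[d] for d in sorted(pairs)}

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
m_path = mass_distance(path)
m_cycle = mass_distance(cycle)
```

On a path, which is a tree, the mass grows by one vertex per unit distance. In the four-vertex cycle the two geodesics between opposite vertices together contain all four vertices, so the mass at $d = 2$ exceeds the tree value.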

For the networks belonging to class I, such as the PIN2 (iii) and the metabolic network for
eukaryotes (iv), $\mathcal{M}(d)$ exhibits a non-monotonic behavior (Figs. 3a-b), viz., it
exhibits a hump at $d_h \sim 10$ for (iii) or $d_h \sim 14$ for (iv). To understand why such a
hump arises, we visualize the topology of the shortest pathways between a pair of vertices,
taken from the metabolic network of a eukaryote organism, Emericella nidulans (EN), as a
prototypical example for class I. Fig. 4a shows such a graph with linear size 26 edges
($d = 26$), where an edge between a substrate and an enzyme is taken as the unit of length.
From Fig. 4a, one can see that there exists a blob structure inside which vertices are
multiply connected, while vertices outside are singly connected. What is characteristic of
class I is that the blob is localized in a small region. To see this, we measure the mass
density $m(r; d)$, the average number of substrates or enzymes located at position $r$
($\sum_{r=1}^{d} m(r; d) = \mathcal{M}(d)$). The average is taken over all possible pairs of
vertices (56 pairs) separated by the same distance $d = 26$. Note that the metabolic network
is directed, so that the position $r$ is uniquely defined. As shown in Fig. 5, we find that
$m(r; d)$ is sharply peaked at $r = 3$, and is larger than 1 only at $r = 2$, 4, and 6 for
substrates. Thus the blob structure is present even after taking averages, and is localized in
a small region of size $d_b \sim 4$-5, centered at almost the same position $r \approx 3$-4
for different pairs of vertices. The blob size $d_b$ can be measured in another way. In a
given graph of the shortest pathways, we delete singly connected substrates successively until
none is left and measure the linear size of the remaining structure. When averaged over all
pairs of vertices with separations $d > 10$, it comes out to be $d_b \approx 4.5$, consistent
with the value obtained previously for $d = 26$ only. Due to this blob structure, the
mass-distance relation increases abruptly across $d = 4$, as shown in Fig. 3b.

Next, we measure the average mass of a blob, that is, the number of vertices inside a blob for
a given graph of the shortest pathways with separation $d$, averaged over all pairs of
vertices with the same separation. We find that the average blob mass is broadly distributed
in the range $3 < m_b < 23$. In particular, relatively heavy blob masses, $m_b = 15$-23,
mainly come from the graphs whose linear size is $d \sim 8$-14. Due to those blobs with heavy
mass, the mass-distance relation exhibits a hump and decreases at around $d = 14$-16, beyond
which the mass $\mathcal{M}(d)$ increases linearly owing to the presence of singly connected
vertices. In short, the anomalous behavior in the mass-distance relation is due to the
presence of a compact and localized blob structure in the topology of the shortest pathways
between a pair of vertices for the metabolic network of eukaryotes. We have checked the
mass-distance relations and the graphs of the shortest pathways for other networks belonging
to class I, such as the PIN2 and metabolic networks for other organisms, and found that such
topological features are generic, generating the anomalous behavior in the mass-distance
relation. It still remains a challenge to derive the BC exponent $\eta \approx 2.2$
analytically from such structures.

For class II, the mass depends on distance linearly, $\mathcal{M}(d) \sim A d$ for large $d$
(Figs. 3c-d). Despite the linear dependence, the shortest pathway topology for the case of
$A > 1$ is more complicated than that of the simple tree structure where $A = 1$. Therefore,
the SF networks in class II are subdivided into two types, called class IIa and class IIb,
respectively. For class IIa, $A > 1$ and the topology of the shortest pathways includes
multiply connected vertices (Figs. 4b and 4c), while for class IIb, $A \sim 1$ and the
shortest pathway is almost singly connected (Fig. 4d). Examples of real-world networks in
class IIa are the Internet at the AS level ($A \sim 4.5$) and the metabolic network for
archaea ($A \sim 2.0$), while one in class IIb is the WWW ($A \sim 1.0$).

Let us examine the topological features of the shortest pathways for the networks in classes
IIa and IIb more closely. First, for class IIa, we visualize in Fig. 4b a shortest pathway in
the Internet system between a pair of vertices separated by 10 edges, the farthest separation.
It contains a blob structure, but the blob is rather extended, with $d_b = 5$, comparable to
the maximum separation $d = 10$. We obtain $d_b = 5$ for $d = 11$ for another system. For
comparison, $d_b \sim 4.5$ for $d = 26$ in class I. Moreover, the featureless mass-position
dependence $m(r; d)$ we have found implies that, while most blobs are located almost in the
middle of the shortest pathways, which seems to be caused by a geometric effect, there are a
finite number of blobs located at the ends of the shortest pathways. Note that
$m(r; d) = m(d - r; d)$ since the Internet is undirected. Owing to the extended structure and
the scattered location of the blob, the mass-distance relation exhibits the linear behavior
$\mathcal{M}(d) \sim A d$ with $A \approx 4.5$. The extended blob structure is also observed
in the metabolic network for archaea (Fig. 4c). Since the network in this case is directed,
the symmetry in $m(r; d)$ does not hold. However, the blob structure extends to almost one
half of the maximum separation, and the shortest pathways are very diverse, so that their
topological properties, such as the mass-distance relation $\mathcal{M}(d)$, are similar to
those of the Internet.

The WWW is an example belonging to class IIb. For this network, the mass-distance relation
exhibits $\mathcal{M}(d) \sim 1.0\,d$, suggesting that the topology of the shortest pathways
is almost singly connected, which is confirmed in Fig. 4d. When a SF network is of tree
structure, one can solve the distribution of BC running through each edge analytically, and
obtain the BC exponent to be $\eta = 2$. A derivation of this exact result is presented in
the Appendix.

Comparison of the Resilience under Attack 

So far we have investigated the topological features of the shortest pathways of SF networks
of each class. What distinct physical phenomena, then, originate from such different
topological features? To address this question, we investigate the resilience of networks
under a malicious attack. It is known that SF networks are extremely vulnerable to an
intentional attack on a few vertices with high degree, while they are very robust to random
failures (25). To compare how vulnerable networks in different classes are under such attacks,
we first construct a directed network whose numbers of vertices and edges and whose degree
distribution are identical to those of the WWW (xiv), but whose BC exponent is 2.2. It can be
generated, for example, by following the stochastic rule introduced in the directed static
model (11). For both the real-world WWW and the artificial model network, we remove vertices
successively in descending order of BC. As vertices are removed, both the mean distance
$\langle d \rangle$ between two vertices, known as the diameter, and the relative size $S$ of
the giant cluster are measured as a function of the fraction of removed vertices $f$. As can
be seen in Fig. 6a, the diameter of the WWW with $\eta = 2.0$ (class IIb) increases more
rapidly than that of the network with $\eta = 2.2$ (class I) and shows discrete jumps as
vertices are removed. Also, the relative size of the largest cluster decreases more rapidly
for $\eta = 2.0$ than for $\eta \approx 2.2$ (Fig. 6b). This behavior arises from the fact
that the shortest pathway consists mainly of singly connected vertices for class IIb, so that
there is no alternative pathway with the same distance when a single vertex lying on the
shortest pathway is removed. For the real-world Internet with $\eta \approx 2.0$ in class IIa
and an artificial network with $\eta \approx 2.2$ that has the same numbers of vertices and
edges and the identical degree distribution, the differences in the diameter
$\langle d \rangle$ and in the relative size $S$ of the largest cluster appear to be rather
small (Figs. 6c-d), in comparison with the case of the WWW (Figs. 6a-b). This is because the
shortest pathways are multiply connected for class IIa.
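
The removal procedure can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: the removal order is computed once beforehand (for example, descending BC) and passed in as `order`, the graph is treated as undirected, $S$ is normalized by the original number of vertices, and only $S$ is tracked, not the mean distance. The names `giant_fraction` and `attack_curve` are hypothetical.

```python
from collections import deque

def giant_fraction(adj, removed, n_original):
    """Size of the largest connected cluster among surviving vertices, over n_original."""
    seen = set(removed)
    best = 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        size = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / n_original

def attack_curve(adj, order):
    """Return (f, S) pairs after removing the vertices in `order` one at a time.

    Assumption: `order` is a precomputed ranking, e.g. vertices by descending BC.
    """
    n = len(adj)
    removed = set()
    curve = [(0.0, giant_fraction(adj, removed, n))]
    for v in order:
        removed.add(v)
        curve.append((len(removed) / n, giant_fraction(adj, removed, n)))
    return curve

# A star is singly connected through its hub, so one removal shatters it.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
# In a ring every pair has an alternative route, so one removal leaves it connected.
ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
star_curve = attack_curve(star, [0])
ring_curve = attack_curve(ring, [0])
```

The two toy graphs mirror the contrast drawn in the text: removing the hub of the tree-like star leaves only isolated vertices, while the ring, which has alternative pathways, loses only the removed vertex.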

Conclusions 

In conclusion, we have found that the betweenness centrality can determine the universal
behavior of SF networks. By examining a variety of real-world and artificial SF networks, we
have observed two distinct universality classes whose BC exponents are $\eta \approx 2.2(1)$
(class I) and $2.0$ (class II), respectively. The mass-distance relation is introduced to
characterize the topological features of the shortest pathways. It shows a hump for the class
I networks due to compact and localized blobs in the shortest pathway topologies, while it is
roughly linear for the class II ones, which are more or less tree-like. The class II networks
can further be divided into two types depending on whether the shortest pathway topology
contains diversified pathways (class IIa) or mostly singly connected ones (class IIb).
Distinct features of the resilience under attack arising from the different topologies of the
shortest pathways are also identified. Since SF networks show the small-world property, the
topology of the shortest pathways should be of relevance for characterizing the network
geometry. Indeed, the mass-distance relations for different universality classes show
different behaviors. Such a relation between the universality class and the topological
features of the shortest pathways may be understood by analogy with the fact that the
geometric fractal structure of the magnetic domains in equilibrium spin systems at criticality
can classify the universality classes. Further characterizations in static and dynamic
properties and possible evolutionary origin of the universality classes are interesting
questions left
for future study.

APPENDIX

Here we present the analytic derivation of the BC distribution for a tree structure; however,
the derivation is carried out for the edge BC rather than the vertex version (footnote 3). The
edge betweenness centrality is defined on edges as in Eq. (1), with the subscript $k$ now
denoting a bond. Without any rigorous proof, we assume that the distributions of vertex BC and
edge BC behave in the same manner, particularly on tree structures, which is confirmed by
numerical simulations. We have also checked the identity between the vertex BC and the edge BC
distributions for a deterministic model of a scale-free tree introduced by Jung et al. (23),
which will be published elsewhere.

Footnote 3: See Szabo et al. (cond-mat/020327^ 2nd version) for a mean field treatment of the
vertex BC problem on trees.

We consider a growing tree network such as the BA-type model with $m = 1$, where a newly
introduced vertex attaches an edge to an already existing vertex $j$ with probability
proportional to its degree, as $(k_j + a)/\sum_\ell (k_\ell + a)$. Then the network consists
of $N(t) = t + 1$ vertices and $L(t) = t$ edges at time $t$. The stationary degree
distribution is a power law with $\gamma = 3 + a$ (|2^; p7[ ). Each edge of a tree divides the
vertices into two groups attached to either side of the edge. Let $P_s(m, t)$ be the
probability that the edge born at time $s$ bridges a cluster with $m$ vertices on the
descendant side and another with the remaining $t + 1 - m$ vertices on the ancestor side. Due
to the tree structure, the BC running through that edge born at $s$ is given as
$g = 2m(t + 1 - m)$, independent of the birth time $s$. The probability $P_s(m, t)$ evolves as
a new vertex attaches to one of the two clusters. The rate equation for this process is
written as

$$P_s(m, t+1) = r_1(m, t)\, P_s(m, t) + r_2(m-1, t)\, P_s(m-1, t), \tag{A1}$$

where $r_1(m, t)$ is the probability that a new vertex attaches to the cluster with
$(t + 1 - m)$ vertices on the ancestor side, and $r_2(m-1, t)$ is the probability that it
attaches to the cluster with $(m - 1)$ vertices on the descendant side. They are given
explicitly as

$$r_1(m, t) = 1 - r_2(m, t) = \frac{(2 + a)(t - m) + a + 1}{2t + a(t + 1)}. \tag{A2}$$

Here the denominator is the total attachment weight $\sum_\ell (k_\ell + a)$ of the tree with
$t + 1$ vertices and $t$ edges, and the numerator is that of the ancestor-side cluster, whose
$t + 1 - m$ vertices share $t - m$ internal edges plus one end of the dividing edge. Since the
amount of the BC on the edge $s$ is independent of the birth time, we introduce $P(m, t)$,

$$P(m, t) = \frac{1}{t} \sum_{s=1}^{t} P_s(m, t), \tag{A3}$$

which is the probability for a certain edge to be located between two clusters with $m$ and
$t + 1 - m$ vertices, averaged over its birth time. The BC on that edge is still given by
$2m(t + 1 - m)$. In terms of $P(m, t)$, Eq. (A1) can be written as

$$(t + 1)\, P(m, t+1) = r_1(m, t)\, t\, P(m, t) + r_2(m-1, t)\, t\, P(m-1, t). \tag{A4}$$

In the limit $t \to \infty$, one may rewrite $P(m, t)$ in a scaling form,
$P(m, t) = \mathcal{P}(m/t)$, and then Eq. (A4) is rewritten as

$$(t + 1)\,\mathcal{P}(x) - t\,\mathcal{P}(x) \simeq -x \frac{d\mathcal{P}(x)}{dx} - \mathcal{P}(x), \tag{A5}$$

where $x = m/t$ and the approximation
$\mathcal{P}(x - 1/t) \simeq \mathcal{P}(x) - (1/t)\, d\mathcal{P}(x)/dx$ has been used. From
this we obtain that

$$\mathcal{P}(x) \sim \frac{1}{x^2}, \tag{A6}$$

independent of the tuning parameter $a$. Using $g = 2(t + 1 - m)m \simeq 2t^2 x$ for large $t$
and finite $m$, Eq. (A6) becomes

$$P_B(g) \sim \frac{1}{g^2}. \tag{A7}$$

Thus $\eta = 2$ is obtained for the tree structure, independent of $\gamma > 2$. General
finite size scaling relations for $P_B(g)$ are discussed in Ref. (pgj).
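
The tree relation $g = 2m(t + 1 - m)$ used in the derivation above lends itself to a short numerical illustration. The Python sketch below grows a tree by the attachment rule $(k_j + a)/\sum_\ell (k_\ell + a)$ and reads the edge BC off the descendant-cluster sizes. It is a sketch under stated assumptions, not the simulation behind Fig. 2: the names `grow_tree` and `edge_bc` are hypothetical, each edge is labelled by the vertex it brought into the tree, and the weighted choice costs a full pass over the vertices at every step, so it suits only small trees.

```python
import random

def grow_tree(t_max, a=0.0, seed=0):
    """Grow a tree with t_max edges; parent[v] is the vertex that v attached to.

    Attachment probability is proportional to (degree + a); requires a > -1.
    The tree starts at time t = 1 as a single edge between vertices 0 and 1.
    """
    rng = random.Random(seed)
    parent = [None, 0]
    degree = [1, 1]
    for new in range(2, t_max + 1):
        weights = [k + a for k in degree]
        target = rng.choices(range(new), weights=weights)[0]
        parent.append(target)
        degree[target] += 1
        degree.append(1)
    return parent

def edge_bc(parent):
    """Edge BC g = 2 m (n - m), with m the number of vertices on the descendant side.

    Keys are the child end of each edge. Assumes parent[v] < v for every v >= 1.
    """
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):  # children are visited before their parents
        size[parent[v]] += size[v]
    return {v: 2 * size[v] * (n - size[v]) for v in range(1, n)}

chain_bc = edge_bc([None, 0, 1, 2])   # path 0-1-2-3
star_bc = edge_bc([None, 0, 0, 0])    # hub 0 with three leaves
tree = grow_tree(50, a=0.0, seed=1)
tree_bc = edge_bc(tree)
```

For the four-vertex path the middle edge separates two vertices from two, giving $g = 2 \cdot 2 \cdot 2 = 8$, while each outer edge gives $2 \cdot 1 \cdot 3 = 6$. A histogram of `tree_bc.values()` over much larger trees is what one would compare with Eq. (A7), although this sketch makes no claim about how quickly the $g^{-2}$ form emerges.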
Figure Legends

Fig. 1 The BC distributions of real-world networks.

(a) Networks belonging to class I: co-authorship network (x), core of the PIN of yeast by
Ito et al. (+), and metabolic network of EN (O). The solid line is a fitted line with a slope
of $-2.2$. (b) Networks belonging to class II: WWW of nd.edu (O) and Internet AS as of
October, 2001 (□). The solid line has a slope of $-2.0$.

Fig. 2 Comparison of the BC distributions for the two classes.

The BA model with $m = 1$, 2, and 3 is simulated for a large system size,
$N = 3 \times 10^5$, and averaged over 10 configurations. The dotted line has a slope of
$-2.0$ and the dashed one, $-2.2$.

Fig. 3 The mass-distance relation $\mathcal{M}(d)$.

(a) Core of the PIN of yeast obtained by Ito et al. (b) Metabolic networks of eukaryotes.
Data are averaged over all 5 organisms in Ref. (R). Note that in this case we count only
substrates for $\mathcal{M}(d)$. (c) Internet AS as of October, 2001. (d) WWW of nd.edu.

Fig. 4 Topology of the shortest pathways.

(a) The metabolic network of EN (eukaryote) of length 26. (b) The Internet AS of length 10.
(c) The metabolic network of Methanococcus jannaschii (archaea) of length 20. (d) WWW of
nd.edu of length 20. In (a) and (c), circles denote substrates and rectangles denote
intermediate states.

Fig. 5 The mass density.

$m(r; d)$ for EN with $d = 26$. Circles denote substrates and rectangles intermediate states.

Fig. 6 Attack vulnerability of the scale-free networks.

The WWW ($\eta = 2.0$) (■) and the artificial directed SF network with $\eta = 2.2$ (□), the
Internet ($\eta = 2.0$) (•) and the artificial undirected SF network with $\eta = 2.2$ (O):
Changes in network diameter (a, c) and the relative size of the largest cluster (b, d) are
shown as a function of $f$, the fraction of removed vertices measured in percent (%).

FIG. 1 Goh et al.



Table Legend 

Table 1 Natures of diverse SF networks. 

Tabulated for each network are the size $N$, the mean degree $\langle k \rangle$, the degree
exponent $\gamma$, and the betweenness centrality exponent $\eta$.

FIG. 2 Goh et al.
FIG. 3 Goh et al.
TABLE I Goh et al.
FIG. 4 Goh et al.